Initial Data Exploration

SomaLogic Data QC

Aptamer Descriptions

Table 1: List of aptamers grouped by organism and type categories, with number of aptamers per category shown in right-most column.

Organism                     Type                           Aptamer_Count
Human                        Protein                        7289
Mouse                        Protein                        233
Human                        Spuriomer                      20
Human                        Hybridization Control Elution  12
Human                        Non-Biotin                     10
Human                        Non-Cleavable                  4
African clawed frog          Non-Human                      3
Gila monster                 Non-Human                      3
Hornet                       Non-Human                      3
Jellyfish                    Non-Human                      3
Mouse                        Non-Human                      3
Thermus thermophilus         Non-Human                      3
Common eastern firefly       Non-Human                      2
European elder               Non-Human                      2
Bacillus stearothermophilus  Non-Human                      1
Ensifer meliloti             Non-Human                      1
HIV-1                        Protein                        1
HIV-2                        Protein                        1
Red alga|Red alga            Non-Human                      1
strain K12                   Non-Human                      1

Table 2: Number of human protein aptamers (7289, see Table 1) per quality category. Quality refers to the success with which the human protein-designed aptamer is able to quantify protein intensity in mouse plasma samples.

Quality Count
High 2653
Low 910
Medium 3726

Proteomic Depth of Coverage

Figure 1: Plots of aptamer counts per sample for A) mouse and B) human datasets above eLoD calculated for each aptamer (eLoD = Median_blank + 4.9 × MAD_blank). The blue horizontal line indicates the total number of aptamers in the panel (7596). The red horizontal line (mouse only) indicates the number of human protein aptamers assessed to perform with high/medium quality in mouse samples.

Figure 2: Plots of aptamer counts per sample for A) mouse and B) human pre-normalization datasets above eLoD calculated for each aptamer (eLoD = Median_blank + 4.9 × MAD_blank). The blue horizontal line indicates the total number of aptamers in the panel (7596). The red horizontal line (mouse only) indicates the number of human protein aptamers assessed to perform with high/medium quality in mouse samples.
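The eLoD rule quoted in the captions above can be illustrated with a minimal sketch. It assumes an unscaled median absolute deviation (no 1.4826 consistency factor) and a strict "greater than" comparison; neither detail is stated in this report. The function names and toy RFU values are hypothetical and are not taken from the study data.

```python
from statistics import median

ELOD_MAD_MULTIPLIER = 4.9  # multiplier quoted in the figure captions


def mad(values):
    """Unscaled median absolute deviation (assumption: no 1.4826 scaling)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def elod(blank_rfu):
    """eLoD = Median_blank + 4.9 * MAD_blank for a single aptamer."""
    return median(blank_rfu) + ELOD_MAD_MULTIPLIER * mad(blank_rfu)


def count_above_elod(sample_rfu, blanks_by_aptamer):
    """Count aptamers whose RFU in one sample exceeds that aptamer's eLoD."""
    return sum(
        1
        for aptamer, rfu in sample_rfu.items()
        if rfu > elod(blanks_by_aptamer[aptamer])
    )


# Hypothetical toy values for two made-up aptamers.
blanks = {
    "seq.A": [100.0, 110.0, 90.0, 105.0, 95.0],
    "seq.B": [200.0, 220.0, 180.0, 210.0, 190.0],
}
sample = {"seq.A": 150.0, "seq.B": 205.0}

print(elod(blanks["seq.A"]))
print(count_above_elod(sample, blanks))
```

Applying `count_above_elod` to every sample would give the kind of per-sample counts plotted in Figures 1 and 2.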

Assess Assay Variability

Figure 3: Boxplots showing distribution of aptamer intensities by sample for mouse and human datasets, final data and pre-normalization data for each. Plots are color coded by clinical group for both datasets. Most samples in each dataset have similar intensity distributions. Calibrator samples and hemolysis samples (mouse only) show slightly wider distribution ranges. The sole AK mouse sample has a slightly lower-shifted distribution compared to other samples.

Figure 4: Bar graph showing number of aptamers removed by eLoD filter per sample for mouse datasets.

Figure 5: Boxplots showing distribution of removed aptamer counts per sample, grouped by aptamer quality in mouse. Outliers - samples with an irregular number of aptamers removed - are labeled.

Figure 6: Bar graph showing number of aptamers removed by eLoD filter per sample for human datasets.

Figure 7: Bar graph showing distribution of aptamers removed using eLoD filter, categorized by aptamer quality score in mouse system.

Figure 8: Scatter plot showing the intensity distribution for three aptamers: seq.10000.28 (high quality aptamer in mouse), seq.10044.12 (low quality), and seq.10003.15 (medium quality). The low quality aptamer has higher variation than both the medium and high quality aptamers in both the normalized and pre-normalization mouse datasets, although the difference is more apparent in the normalized dataset.

Figure 9: SomaLogic Log2 intensity boxplot distributions separated by mouse aptamer quality score. All three quality categories appear to have similar intensity distributions per sample.

Figure 10: Aptamer CV distributions for murine and human data, grouped and colored by condition. Overall, low CVs are observed in both species datasets, with the human data showing slightly lower aptamer CVs than the murine data.
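As a rough illustration of how a per-aptamer CV grouped by condition might be computed, the sketch below assumes CV = 100 × sample SD / mean on the linear RFU scale. The report does not state the exact formula used, and the function names, sample IDs and group labels are hypothetical.

```python
from statistics import mean, stdev


def percent_cv(rfu_values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(rfu_values) / mean(rfu_values)


def cv_by_group(rfu_by_sample, group_by_sample):
    """Return {group: %CV} for one aptamer from per-sample RFU values."""
    grouped = {}
    for sample_id, rfu in rfu_by_sample.items():
        grouped.setdefault(group_by_sample[sample_id], []).append(rfu)
    # A CV needs at least two replicates, so smaller groups are skipped.
    return {g: percent_cv(v) for g, v in grouped.items() if len(v) >= 2}


# Hypothetical toy values for a single aptamer.
rfu = {"s1": 90.0, "s2": 100.0, "s3": 110.0, "s4": 500.0}
groups = {"s1": "group_a", "s2": "group_a", "s3": "group_a", "s4": "group_b"}

print(cv_by_group(rfu, groups))
```

Groups with a single sample drop out of the result because a standard deviation cannot be estimated from one value.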

SomaLogic Data Analysis

Mouse (Project Disney) Analysis

Assess Limit of Detection (Dilution Series) - PRE-NORMALIZATION DATA ONLY

Figure 11: P-value distributions for the Wilcox tests performed on the mouse dilution series data (pre-normalization).

Figure 12: Volcano plots showing significantly differentially expressed proteins between mouse dilution series samples A) 35uL vs. 55uL, B) 35uL vs. 55uL Diluted [35uL sample, 20uL PBS], C) 55uL vs. 55uL Diluted.
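The group comparisons in this report pair a per-aptamer Wilcoxon test with a volcano plot. The sketch below shows one way a single volcano point could be derived. It is a simplified stand-in rather than the analysis code behind the figures: it uses the normal approximation with tie and continuity corrections, and it assumes the fold change is a ratio of group medians, which the report does not specify. Function names and values are hypothetical.

```python
import math
from statistics import median


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Sort pooled values, remembering each value's original position.
    pooled = sorted((value, idx) for idx, value in enumerate(list(x) + list(y)))
    ranks = [0.0] * n
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        midrank = (start + end) / 2.0 + 1.0  # average rank shared by ties
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = midrank
        t = end - start + 1
        tie_term += t**3 - t
        start = end + 1
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0  # U statistic for group x
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # every value tied: no evidence of a shift
    # Continuity-corrected z score, floored at zero.
    z = max(abs(u - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    return math.erfc(z / math.sqrt(2.0))


def volcano_point(case_rfu, control_rfu):
    """Return (log2 fold change of medians, -log10 p) for one aptamer."""
    log2_fc = math.log2(median(case_rfu) / median(control_rfu))
    p = rank_sum_p_value(case_rfu, control_rfu)
    return log2_fc, -math.log10(p)


# Hypothetical RFU values for one aptamer in two groups.
case = [10.0, 12.0, 14.0, 16.0, 18.0]
control = [1.0, 2.0, 3.0, 4.0, 5.0]
print(volcano_point(case, control))
```

With only three replicates per dilution group, as in the dilution series tables, the smallest attainable two-sided p-value is limited, which is worth keeping in mind when reading the p-value histograms.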

Table 2: Number of aptamers/proteins detected per mouse dilution series sample (pre-normalization). The 55uL Diluted samples have the lowest counts, but every sample detects at least 7403 of the 7596 aptamers in the panel (more than 97%).

Category      Sample_ID             AptCount
35uL          Mouse 35 uL Rep 1     7512
35uL          Mouse 35 uL Rep 2     7534
35uL          Mouse 35 uL Rep 3     7540
55uL          Mouse A Series Rep 1  7537
55uL          Mouse A Series Rep 2  7537
55uL          Mouse A Series Rep 3  7550
55uL Diluted  Mouse B Series Rep 1  7491
55uL Diluted  Mouse B Series Rep 2  7403
55uL Diluted  Mouse B Series Rep 3  7421

Figure 13: Boxplots of fold change (FC) distributions for the murine dilution series data. The median FC is on target for the expected dilution ratio of 35 µL to 55 µL (35/55 ≈ 0.64, approximately 0.6x).
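The expected ratio follows from the dilution design: 35 µL of sample brought to 55 µL should give signals of about 35/55 of the undiluted level. The sketch below shows one way the median FC could be checked against that target. It assumes FC is the per-aptamer ratio of diluted to undiluted RFU on the linear scale, which is not stated in the report; the function name and values are hypothetical.

```python
from statistics import median

EXPECTED_RATIO = 35.0 / 55.0  # about 0.64, reported as 0.6x in the caption


def median_fold_change(diluted_rfu, undiluted_rfu):
    """Median across aptamers of per-aptamer RFU ratios (diluted / undiluted)."""
    ratios = [
        diluted_rfu[a] / undiluted_rfu[a]
        for a in diluted_rfu
        if a in undiluted_rfu and undiluted_rfu[a] > 0
    ]
    return median(ratios)


# Hypothetical RFU values for three made-up aptamers.
undiluted = {"seq.A": 1000.0, "seq.B": 2000.0, "seq.C": 500.0}
diluted = {"seq.A": 640.0, "seq.B": 1200.0, "seq.C": 330.0}

observed = median_fold_change(diluted, undiluted)
print(observed, EXPECTED_RATIO)
```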

Hemolysis Comparison

Figure 14: P-value distributions for the Wilcox tests performed on the mouse data comparing intensity (RFU) values between aptamers in the Hemolysis and Pooled 55uL groups. A) Normalized mouse data, B) Pre-normalization mouse data.

Figure 15: Volcano plots showing significantly differentially expressed proteins between hemolysis and pooled 55uL groups in mouse data, A) normalized, B) pre-normalization.

Figure 16: Hemoglobin intensity results comparing each hemoglobin-targeting aptamer in the murine suspected hemolysis samples and murine pooled 55µL samples.

Table_Key  Aptamer_Target                         Target_Full_Name
1          seq.17137.160_Beta-globin              Hemoglobin subunit beta
2          seq.18198.51_HBAT                      Hemoglobin subunit theta-1
3          seq.19774.8_HBG2                       Hemoglobin subunit gamma-2
4          seq.4915.64_Hemoglobin                 Hemoglobin
5          seq.6919.3_HBAZ                        Hemoglobin subunit zeta
6          seq.6992.67_HBD                        Hemoglobin subunit delta
7          seq.7136.107_Hemoglobin epsilon chain  Hemoglobin subunit epsilon
8          seq.7965.25_HBAT                       Hemoglobin subunit theta-1
9          seq.9025.5_AHSP                        Alpha-hemoglobin-stabilizing protein

Table 3: Hemoglobin-targeting aptamers and their descriptions, with the key used to identify the Figure 16 sub-plots.

KMC-PDAC v. KP Late:

Reveals differentially abundant proteins in autochthonous PDAC model

Figure 17: P-value distributions for the Wilcox tests performed on the mouse data comparing intensity (RFU) values between aptamers in the KMC-PDAC and Healthy Control - Late/KP Late groups. A) Normalized mouse data, B) Pre-normalization mouse data.

Figure 18: Volcano plots showing significantly differentially expressed proteins between KMC - PDAC and Healthy Control - Late/KP Late groups in mouse data, A) normalized, B) pre-normalization.

KPC-Lung v. KP Late:

Reveals differentially abundant proteins in autochthonous lung cancer model

Figure 19: P-value distributions for the Wilcox tests performed on the mouse data comparing intensity (RFU) values between proteins in the KPC-Lung and KP Late groups. A) Normalized mouse data, B) Pre-normalization mouse data.

Figure 20: Volcano plots showing significantly differentially expressed proteins between KPC-Lung and Healthy Control - Late/KP Late groups in mouse data, A) normalized, B) pre-normalization.

KMC v. KMC-Control:

Reveals differentially abundant proteins in autochthonous Myc-driven PDAC model

Figure 21: P-value distributions for the Wilcox tests performed on the mouse data comparing intensity (RFU) values between proteins in the KMC-PDAC and KMC Control groups. A) Normalized mouse data, B) Pre-normalization mouse data.

Figure 22: Volcano plots showing significantly differentially expressed proteins between KMC-PDAC and KMC Control groups in mouse data, A) normalized, B) pre-normalization.

KPC-Early v. KP Early:

Reveals differentially abundant proteins in early lethal PanIN model

Figure 23: P-value distributions for the Wilcox tests performed on the mouse data comparing intensity (RFU) values between proteins in the KPC - Early and Healthy control - early/KP-early groups. A) Normalized mouse data, B) Pre-normalization mouse data.

Figure 24: Volcano plots showing significantly differentially expressed proteins between KPC - early and Healthy control - early/KP - early groups in mouse data, A) normalized, B) pre-normalization.

KPC-Late (Lethal PanIN – Late) v. KP Late:

Reveals differentially abundant proteins in late lethal PanIN model

Figure 25: P-value distributions for the Wilcox tests performed on the mouse data comparing intensity (RFU) values between proteins in the Lethal PanIN - Late/KPC - Late and Healthy Control - Late/KP-Late groups. A) Normalized mouse data, B) Pre-normalization mouse data.

Figure 26: Volcano plots showing significantly differentially expressed proteins between Lethal PanIN - Late/KPC - Late and Healthy Control - Late/KP-Late groups in mouse data, A) normalized, B) pre-normalization.

KC-Late v. KP Late:

Reveals differentially abundant proteins in non-lethal PanIN model

Figure 27: P-value distributions for the Wilcox tests performed on the mouse data comparing intensity (RFU) values between proteins in the Non-lethal PanIN - Late/KC - Late and Healthy Control - Late/KP-Late groups. A) Normalized mouse data, B) Pre-normalization mouse data.

Figure 28: Volcano plots showing significantly differentially expressed proteins between Non-lethal PanIN - Late/KC - Late and Healthy Control - Late/KP-Late groups in mouse data, A) normalized, B) pre-normalization.

Human (Project Orion) Analysis

Assess Limit of Detection (Dilution Series) - PRE-NORMALIZATION DATA ONLY

Figure 29: P-value distributions for the Wilcox tests performed on the human dilution series data (pre-normalization).

Figure 30: Volcano plots showing significantly differentially expressed proteins between human dilution series samples A) 35uL vs. 55uL, B) 35uL vs. 55uL Diluted [35uL sample, 20uL PBS], C) 55uL vs. 55uL Diluted.

Table 4: Number of aptamers/proteins detected per human dilution series sample (pre-normalization). The 55uL Diluted samples have the lowest counts, but every sample detects at least 7482 of the 7596 aptamers in the panel (more than 98%).

Category      Sample_ID             AptCount
35uL          Human 35 uL Rep 1     7553
35uL          Human 35 uL Rep 2     7558
35uL          Human 35 uL Rep 3     7561
55uL          Human A Series Rep 1  7566
55uL          Human A Series Rep 2  7576
55uL          Human A Series Rep 3  7577
55uL Diluted  Human B Series Rep 1  7503
55uL Diluted  Human B Series Rep 2  7482
55uL Diluted  Human B Series Rep 3  7504

Figure 31: Boxplots of fold change (FC) distributions for the human dilution series data. The median FC is on target for the expected dilution ratio of 35 µL to 55 µL (35/55 ≈ 0.64, approximately 0.6x).

Case (Adenocarcinoma) v. All Controls

Figure 32: P-value distributions for the Wilcox tests performed on the human data comparing intensity (RFU) values between proteins in the Case (Adenocarcinoma) and All Controls groups. A) Normalized human data, B) Pre-normalization human data.

Figure 33: Volcano plots showing significantly differentially expressed proteins between Case (Adenocarcinoma) and All Controls groups in human data, A) normalized, B) pre-normalization.

Case (Adenocarcinoma) v. Control (Benign)

Figure 34: P-value distributions for the Wilcox tests performed on the human data comparing intensity (RFU) values between proteins in the Case (Adenocarcinoma) and Control (Benign) groups. A) Normalized human data, B) Pre-normalization human data.

Figure 35: Volcano plots showing significantly differentially expressed proteins between Case (Adenocarcinoma) and Control (Benign) groups in human data, A) normalized, B) pre-normalization.

Case (Adenocarcinoma) v. Control (Benign Prostatic Hyperplasia)

Figure 36: P-value distributions for the Wilcox tests performed on the human data comparing intensity (RFU) values between proteins in the Case (Adenocarcinoma) and Control (Benign Prostatic Hyperplasia) groups. A) Normalized human data, B) Pre-normalization human data.

Figure 37: Volcano plots showing significantly differentially expressed proteins between Case (Adenocarcinoma) and Control (Benign Prostatic Hyperplasia) groups in human data, A) normalized, B) pre-normalization.

Case (Adenocarcinoma) v. Control (Prostatic intraepithelial neoplasia)

Figure 38: P-value distributions for the Wilcox tests performed on the human data comparing intensity (RFU) values between proteins in the Case (Adenocarcinoma) and Control (Prostatic Intraepithelial Neoplasia) groups. A) Normalized human data, B) Pre-normalization human data.

Figure 39: Volcano plots showing significantly differentially expressed proteins between Case (Adenocarcinoma) and Control (Prostatic Intraepithelial Neoplasia) groups in human data, A) normalized, B) pre-normalization.

SomaLogic vs. Seer Proteograph

Protein ID Overlap & Fold Change Comparison - KMC-PDAC v. KMC Control

## [1] "Starting mapping..."
## [1] "Not all UniProts submitted are valid in NCBI (EntrezID) DB. Proceeding with pivot for valid UniProts only."
## [1] "Human uniprots mapped to EntrezID..."
## [1] "Mouse --> Human Entrez ID Orthology Match performed..."
## [1] "All annotation columns successfully added to NCBI-Mapped DF..."
## [1] "UniProt Nomenclature Conversion: Human --> Mouse Complete. KEGG Orthology DB used to address gaps in NCBI DB."
## [1] "Starting mapping..."
## [1] "Not all uniprots submitted are valid in NCBI (EntrezID) DB. Proceeding with pivot for valid uniprots only."
## [1] "Mouse uniprots mapped to EntrezID..."
## [1] "Mouse --> Human Entrez ID Orthology Match performed..."
## [1] "All annotation columns successfully added to NCBI-Mapped DF..."
## [1] "UniProt Nomenclature Conversion: Mouse --> Human Complete. KEGG Orthology DB used to address gaps in NCBI DB."

Figure 40: Figures summarizing the results of cross-platform analysis for the murine KMC-PDAC and KMC control data.

Protein ID Overlap & Fold Change Comparison - Human Case v. All Control

Figure 41: Figures summarizing the results of cross-platform analysis for the human Case v. Control comparison data.

Raw data Method Correlation - Murine

Figure 42: Figures summarizing the results of cross-platform analysis for the murine KMC-PDAC and KMC control Log2 intensity and Log2 ZScore intensity data.

Raw data Method Correlation - Human

Figure 43: Figures summarizing the results of cross-platform analysis for the human Case v. Control Log2 intensity and Log2 ZScore intensity data.

Figure 44: Histograms showing distribution of SomaLogic and Seer Log2 and Log2 ZScore intensity data for murine KMC-PDAC and KMC control, and human Case and Control.
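For readers unfamiliar with the "Log2 ZScore" scale, the sketch below shows one plausible construction: log2-transform the intensities, then standardize them to mean 0 and SD 1. It assumes standardization across all values supplied (for example, within one platform) using the sample standard deviation. The report does not say how its z-scores were computed, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def log2_zscores(intensities):
    """Log2-transform positive intensities, then standardize to z-scores."""
    logs = [math.log2(v) for v in intensities]
    center, spread = mean(logs), stdev(logs)
    return [(v - center) / spread for v in logs]


# Hypothetical intensities chosen so the arithmetic is easy to follow.
print(log2_zscores([1.0, 2.0, 4.0]))
```

Putting both platforms on a z-score scale removes differences in absolute intensity units, so the histograms can be compared by shape rather than by magnitude.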

Table 5: Unique protein counts for each platform (SomaLogic and Seer MS) for the two main studies compared here (murine KMC-PDAC and KMC Control, human Case and Control) and the number of overlapping protein IDs between platforms in each study. The Seer protein list is filtered to keep only single-protein IDs (no protein groups) so that it is most comparable to the single proteins targeted by SomaLogic aptamers.

Study                                  Seer Protein Count Initial  Seer Protein Count 50% Sparsity  Seer Protein Count w/o Protein Groups  SomaLogic Protein Count (pre-eLoD)  SomaLogic Protein Count (post-eLoD)  Overlap Protein Count
Human Prostate Cancer Case v. Control  1545                        424                              80                                     6432                                6422                                 50
Murine KMC-PDAC v. KMC Control         7526                        4085                             1352                                   6432                                6412                                 478
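The overlap counts in Table 5 amount to a set intersection of protein IDs between platforms. The sketch below illustrates that idea and then expresses the Table 5 overlaps as a share of the Seer single-protein IDs. The function name and the protein IDs are hypothetical placeholders; only the counts in the last two lines come from Table 5.

```python
def overlap_summary(seer_ids, somalogic_ids):
    """Count unique IDs per platform and the IDs shared by both."""
    seer, soma = set(seer_ids), set(somalogic_ids)
    shared = seer & soma
    return {
        "seer": len(seer),
        "somalogic": len(soma),
        "overlap": len(shared),
        "pct_of_seer": 100.0 * len(shared) / len(seer),
    }


# Hypothetical placeholder IDs, not real study proteins.
seer_example = ["ID_1", "ID_2", "ID_3", "ID_4"]
somalogic_example = ["ID_2", "ID_4", "ID_5"]
print(overlap_summary(seer_example, somalogic_example))

# Overlap as a share of Seer single-protein IDs, using the Table 5 counts.
human_pct = 100.0 * 50 / 80
murine_pct = 100.0 * 478 / 1352
print(round(human_pct, 1), round(murine_pct, 1))
```

Under this framing, the human study shares a larger fraction of its Seer single-protein IDs with SomaLogic than the murine study does, although its absolute overlap is much smaller.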